1 Project Overview

We  propose  to  build  an  integrated  physical  humanoid 
robot  including  active  vision,  sound  input  and  output, 
dextrous  manipulation,  and  the  beginnings  of  language, 
all  controlled  by  a  continuously  operating  large  scale  par¬ 
allel  MIMD  computer.  This  project  will  capitalize  on 
newly  available  levels  of  computational  resources  in  or¬ 
der  to  meet  two  goals:  an  engineering  goal  of  building 
a  prototype  general  purpose  flexible  and  dextrous  au¬ 
tonomous  robot  and  a  scientific  goal  of  understanding 
human  cognition.  While  there  have  been  previous  at¬ 
tempts  at  building  kinematically  humanoid  robots,  none 
have  attempted  the  embodied  construction  of  an  au¬ 
tonomous  intelligent  robot:  the  requisite  computational 
power  simply  has  not  previously  been  available. 

The  robot  will  be  coupled  into  the  physical  world  with 
high  bandwidth  sensing  and  fast  servo-controlled  actua¬ 
tors,  allowing  it  to  interact  with  the  world  on  a  human 
time  scale.  A  shared  time  scale  will  open  up  new  possi¬ 
bilities  for  how  humans  use  robots  as  assistants,  as  well 
as allowing us to design the robot to learn new behaviors
under  human  feedback  such  as  human  manual  guidance 
and  vocal  approval.  One  of  our  engineering  goals  is  to 
determine  the  architectural  requirements  sufficient  for  an 
enterprise  of  this  type.  Based  on  our  earlier  work  on  mo¬ 
bile  robots,  our  expectation  is  that  the  constraints  may 
be  different  than  those  that  are  often  assumed  for  large 
scale  parallel  computers.  If  ratified,  such  a  conclusion 
could  have  important  impacts  on  the  design  of  future 
sub-families  of  large  machines. 

Recent trends in artificial intelligence, cognitive science,
neuroscience, psychology, linguistics, and sociology
are converging on an anti-objectivist, body-based
approach  to  abstract  cognition.  Where  traditional  ap¬ 
proaches  in  these  fields  advocate  an  objectively  speci¬ 
fiable  reality — brain-in-a-box,  independent  of  bodily 
constraints — these  newer  approaches  insist  that  intelli¬ 
gence  cannot  be  separated  from  the  subjective  expe¬ 
rience  of  a  body.  The  humanoid  robot  provides  the 
necessary  substrate  for  a  serious  exploration  of  the 
subjectivist  — body-based — hypotheses. 

There  are  numerous  specific  cognitive  hypotheses  that 
could  be  implemented  in  one  or  more  of  the  humanoids 
that  will  be  built  during  the  five-year  project.  For  ex¬ 
ample,  we  can  vary  the  extent  to  which  the  robot  is  pro¬ 
grammed  with  an  attentional  preference  for  some  images 
or  sounds,  and  the  extent  to  which  the  robot  is  pro¬ 
grammed  to  learn  to  selectively  attend  to  environmental 
input  as  a  by-product  of  goal  attainment  (e.g.,  success¬ 
ful  manipulation  of  objects)  or  reward  by  humans.  We 
can compare the behavioral result of constructing a humanoid
around different hypotheses of cortical representation,
such as coincidence detection versus interpolating
memory versus sequence seeking in counter streams
versus time-locked multi-regional retroactivation. In the
later  years  of  the  project  we  can  connect  with  theories 
of  consciousness  by  demonstrating  that  humanoids  de¬ 
signed  to  continuously  act  on  immediate  sensory  data 
(as  suggested  by  Dennett's  multiple  drafts  model)  show 
more human-like behavior than robots designed to construct
an elaborate world model.


The act of building and programming behavior-based
robots will force us to face not only issues of interfaces
between  traditionally  assumed  modularities,  but  even 
the  idea  of  modularity  itself.  By  reaching  across  tradi¬ 
tional  boundaries  and  tying  together  many  sensing  and 
acting  modalities,  we  will  quickly  illuminate  shortcom¬ 
ings  in  the  standard  models,  shedding  light  on  formerly 
unrealized  sociologically  shared,  but  incorrect,  assump¬ 
tions. 


2  Background:  the  power  of  enabling 
technology 


An enabling technology such as the brain that we will
build  has  the  ability  to  revolutionize  science.  A  recent 
example  of  the  far-reaching  effects  of  such  technologi¬ 
cal  advances  is  the  field  of  mobile  robotics.  Just  as  the 
advent  of  cheap  and  accessible  mobile  robotics  dramat¬ 
ically  altered  our  conceptions  of  intelligence  in  the  last 
decade,  we  believe  that  current  high-performance  com¬ 
puting  technology  makes  the  present  an  opportune  time 
for  the  construction  of  a  similarly  significant  integrated 
intelligent  system. 

Over the last eight years there has been a renewed
interest in building experimental mobile robot systems
that operate in unadorned and unmodified natural and
unstructured environments. The enabling technology for
this  was  the  single  chip  micro-computer.  This  made  it 
possible  for  relatively  small  groups  to  build  serviceable 
robots  largely  with  graduate  student  power,  rather  than 
the  legion  of  engineers  that  had  characterized  earlier  ef¬ 
forts  along  these  lines  in  the  late  sixties.  The  accessibil¬ 
ity  of  this  technology  inspired  academic  researchers  to 
take  seriously  the  idea  of  building  systems  that  would 
work  in  the  real  world. 

The  act  of  building  and  programming  behavior-based 
robots  fundamentally  changed  our  understanding  of 
what is difficult and what is easy. The effects of this
work on traditional artificial intelligence can be seen in
innumerable areas. Planning research has undergone
a major shift from static planning to deal with "reactive
planning." The emphasis in computer vision has
moved from recovery from single images or canned sequences
of images to active, or animate, vision, where
the observer is a participant in the world controlling the
imaging process in order to simplify the processing requirements.
Generally, the focus within AI has shifted
from centralized systems to distributed systems. Further,
the work on behavior-based mobile robots has also
had a substantial effect on many other fields (e.g., on the
design of planetary science missions, on silicon micro-machining,
on artificial life, and on cognitive science).
There  has  also  been  considerable  interest  from  neuro¬ 
science  circles,  and  we  are  just  now  starting  to  see  some 
bi-directional  feedback  there. 


The grand challenge that we wish to take up is to
make the quantum leap from experimenting with mobile
robot systems to an almost humanoid integrated
head system with saccading foveated vision, facilities
for sound processing and sound production, and a compliant,
dextrous manipulator. The enabling technology
is massively parallel computing: our brain will have
large numbers of processors dedicated to particular sub-functions,
and interconnected by a fixed topology network.

3  Scientific  Questions 

Building  an  android,  an  autonomous  robot  with  hu¬ 
manoid  form,  has  been  a  recurring  theme  in  science  fic¬ 
tion  from  the  inception  of  the  genre  with  Frankenstein, 
through  the  moral  dilemmas  infesting  positronic  brains, 
the  human  but  not  really  human  C3PO  and  the  ever 
present desire for real humanness as exemplified by Commander
Data. Their bodies have ranged from that of a
recycled  actual  human  body  through  various  degrees  of 
mechanical  sophistication  to  ones  that  are  indistinguish¬ 
able  (in  the  stories)  from  real  ones.  And  perhaps  the 
most  human  of  all  the  imagined  robots,  HAL-9000,  did 
not  even  have  a  body. 

While  various  engineering  enterprises  have  modeled 
their  artifacts  after  humans  to  one  degree  or  another 
(e.g., WABOT-II at Waseda University and the space
station tele-robotic servicer of Martin-Marietta) no one
has seriously tried to couple human-like cognitive processes
to these systems. There has been an implicit, and
sometimes explicit, assumption, even from the days of
Turing (see Turing (1970)¹) that the ultimate goal of
artificial intelligence research was to build an android.
There have been many studies relating brain models to
computers (Berkeley 1949), cybernetics (Ashby 1956),
and artificial intelligence (Arbib 1964), and along the
way there have always been semi-popular scientific books
discussing the possibilities of actually building real 'live'
androids (Caudill (1992) is perhaps the most recent).

This proposal concerns a plan to build a series of
robots that are humanoid in form, humanoid in
function, and to some extent humanoid in computational
organization.  While  one  cannot  deny  the  romance  of 
such  an  enterprise  we  are  realistic  enough  to  know  that 
we  can  but  scratch  the  surface  of  just  a  few  of  the  scien¬ 
tific  and  technological  problems  involved  in  building  the 
ultimate  humanoid  given  the  time  scale  and  scope  of  our 
proposal,  and  given  the  current  state  of  our  knowledge. 

The reason that we should try to do this at all is that
for the first time there is plausibly enough computation
available. High performance parallel computation gives
us a new tool that those before us have not had available
and that our contemporaries have chosen not to use
in  such  a  grand  attempt.  Our  previous  experience  in 
attempting  to  emulate  much  simpler  organisms  than  hu¬ 
mans  suggests  that  in  attempting  to  build  such  systems 
we  will  have  to  fundamentally  change  the  way  artificial 
intelligence,  cognitive  science,  psychology,  and  linguis¬ 
tics  think  about  the  organization  of  intelligence.  As  a 
result,  some  new  theories  will  have  to  be  developed.  We 
expect  to  be  better  able  to  reconcile  the  new  theories 
with  current  work  in  neuroscience.  The  primary  bene¬ 
fits  from  this  work  will  be  in  the  striving,  rather  than  in 
the  constructed  artifact. 

¹ Different sources cite 1947 and 1948 as the time of writing,
but it was not published until long after his death.

3.1 Minds


The traditional approach taken in artificial intelligence
to building intelligent programs has affectionately been
dubbed Good Old Fashioned AI, or GOFAI (Haugeland
1985). It is epitomized in the modularity arguments of
Fodor (1983) and in the physical symbol system hypothesis
of Newell & Simon (1981). These approaches reduce
AI to the problem of constructing a brain-in-a-box symbolic
manipulator which would act intelligently if given
appropriate connection to a robot (or other perceptuo-motor
system). Still further modularization leads to independent
work on such tasks as natural language processing,
planning, learning, and commonsense reasoning
(e.g., Allen, Hendler & Tate (1990), Hobbs & Moore
(1985) or Brachman & Levesque (1985)). We have argued
(Brooks 1991a) that much of GOFAI was shaped
by the technological resources available to its researchers.
High performance computing and communications gives
us a new opportunity to re-shape attempts at building
intelligent systems.

Many modern theories are at odds with GOFAI. For
example, Minsky (1986) suggests that the mind is a society
of smaller agents competing and cooperating. Kinsbourne
(1988) and Dennett (1991) argue that there is
no place in the brain where consciousness resides. Linguists
and psycholinguists have argued that the long-fashionable
separation of language into the separate components
of grammar and semantics is a fiction convenient
for symbolic formulation but not useful for advancing our
understanding of the real diversity of language phenomena
(Langacker 1987), (Harris 1991). Brooks (1991n)
proposes that human-level intelligence can be built without
a single central representation of the world. Stein (to
appear) argues that all of cognition can be seen as the
recapitulation, through imagination, of action in the
world.

Many other theories of mind (e.g., Searle (1992), Edelman
(1987), Edelman (1989), Edelman (1992)) argue
against the traditional AI notion of categorical representation,
and instead for a more situated model of computation.
Unfortunately these and others are flawed by fundamental
misunderstandings about the nature of computation
and the uses of abstraction, usually centered
around formal models of Turing machines and sometimes
their interaction with Gödel's theorem. Such arguments
were long ago successfully debunked (Arbib 1964), but
continue to resurface.²

At  the  other  end  of  the  spectrum  is  connectionism. 
Computational  scientists  have  worked  with  simple  ab¬ 
stractions  of  the  brain  for  many  years  in  two  main 
waves, one in the sixties (Rosenblatt 1962), (Minsky
& Papert 1969) and a second in the eighties (Rumelhart
& McClelland 1986). Unfortunately, most of this
work  is  concerned  with  local  aspects  of  the  problem, 
rather  than  giving  insight  into  how  a  complete  system 

² A more egregious version of this is (Penrose 1989), who
not only makes the same Turing-Gödel error but then, in a
desperate attempt to find the essence of mind and applying
the standard methodology of physics, namely to find a simplifying
underlying principle, resorts to an almost mystical
reliance on quantum mechanics.


might be organized.³ There have been recent attempts
to bridge the gap in more serious ways between computation
and neuroscience, in particular Churchland &
Sejnowski (1992), but still the gap is too large to consider
neural-based approaches for a system of the scope
we are proposing. Two of us have already been working
together (Dennett & Kinsbourne 1992), relating a
neuroscientific theory of consciousness, dominant focus
(Kinsbourne 1988), to a philosophical analysis of mind.
A major intent of our proposed work is to extend that
analysis to the point of its being an implementable theory
on our humanoids.

Recent work in neuropsychology has produced surprising
results. Lesion studies, e.g. those by Damasio
& Damasio (1989) and McCarthy & Warrington (1990),
indicate that the modularity of storage and access in
the human brain is dramatically different from what our
intuitions, as exemplified by both cognitive science and
GOFAI, tell us. For instance it is clear that a picture of
a dolphin provides immediate access to a different set of
representations at a different level of generalization from
those prompted by the verbal stimulus, "dolphin". In a
normal person these representations are cross-linked, but
in patients with certain lesions these cross-links may be
destroyed for particular classes of entities (e.g., for animals,
but not tools).⁴ Likewise Newcombe & Ratcliff
(1989) demonstrate multiple parallel channels of control
dependent on the task, rather than, say, a single centralized
finger control module for each finger. There is
a grounding of motor control in the different types of
interactions the agent has with the world.⁵ Nor is the
control of attention centralized, as illustrated by studies
of unilateral neglect (Kinsbourne 1987), but rather it is
a matter of competition between brain systems.

The  argument  is  that  the  human  brain  stores  things 
not only by category but also by modality: the 'representations'
are grounded in the sensory modality used
to learn the information. Kuipers & Byun (1991),
Mataric (1992b) and Stein (to appear) implement limited
forms of this body-based representation in mobile robots.
Drescher (1991), too, uses environmental interaction to
construct  representation.  Still,  each  of  these  projects 
was  limited  by  the  relative  poverty  of  the  sensory  suite. 
In  this  project,  we  will  use  the  neuropsychological  evi¬ 
dence  to  build  a  far  more  sophisticated  instantiation  of 
the  body-based  theory  of  representation  and  to  examine 

³ There are exceptions to this, for instance the work of
Beer (1990), but that is restricted to insect level cognition.

⁴ One particular patient (McCarthy & Warrington 1988),
when shown a picture of a dolphin, was able to form sentences
using the word "dolphin" and talk about its habitat, its ability
to be trained, and its role in the US military. When verbally
asked what a dolphin was, however, he thought it was "either
a fish or a bird." He had no such discrepancies in knowledge
when the subject was, for example, a wheelbarrow.

"For  instance,  some  patients  can  not  exercise  conscious 
control over their fingers for simple tasks, yet seem unimpaired
in threading a needle, or playing the piano. Furthermore,
in some cases selective drug induced suppression shows
ways in which many simple reflexes combine to give the appearance
of a centralized will producing globally coherent
behavior (Philip Teitelbaum & Pellis 1990).


it relative to traditional theories of modularity.

There is also evidence that what appear to be reasonably
well understood sensory channels within the brain
are much more complex than we currently imagine. As
one example, there is the effect known as blindsight,
where despite the lack of pieces or a whole visual cortex,
both humans and animals can perceive, perhaps
not consciously, certain things within their visual field
(Weiskrantz 1986), (Braddick, Atkinson, Hood, Harkness
& Vargha-Khadem 1992). There has
been some recent argument that these phenomena may
be produced by partially intact visual cortex (Fendrich,
Wessinger & Gazzaniga 1992), but even that would still
call into question the arguments of Marr (1982), long
used in computer vision, that the purpose of the vision
system is to reconstruct a 3-dimensional representation
of what is out in the world.

The  notion  that  embodiment  in  the  physical  world  is 
important  to  creating  human-like  intelligence  is  not  at  all 
new. Even the 1947 paper of Turing (1970) is quite concerned
about this point. Later Simon (1969) discussed
a  similar  point  using  as  a  parable  an  ant  walking  along 
the  beach.  He  pointed  out  that  the  complexity  of  the 
behavior  of  the  ant  is  more  a  reflection  of  the  complex¬ 
ity  of  its  environment  than  its  own  internal  complexity 
and  speculated  that  the  same  may  be  true  of  humans. 

The  idea  that  our  very  modularity  and  internal  orga¬ 
nization  depends  on  our  ways  of  physically  interacting 
with the world is carried even further in a series of philosophical
arguments (Lakoff & Johnson 1980), (Lakoff
1987), (Johnson 1987). Their central hypothesis is that
all of our thought and language is grounded in physical
patterns generated in our sensory and motor systems
as we interact with the world. In particular these
physical bases of our reason and intelligence can still
be discerned in our language as we confront the fact
that  much  of  our  language  can  be  "viewed"  as  physical 
metaphors,  "based"  on  our  own  bodily  interactions  with 
the  world. 

We plan on taking these notions seriously as we build
and program our humanoids, using physical interactions
as a basis for higher level cognitive-like behaviors. We
have  already  demonstrated  a  simple  version  of  these 
ideas  using  currently  available  "insect-level"  robotics 
(Stein  to  appear). 

3.2  Symbols  and  Mental  Representation 

The physical symbol system hypothesis maintains that
any  physical  symbol  system  can  implement  intelligent 
behavior.  As  a  consequence,  it  says  that  symbols  provide 
a  layer  of  abstraction  that  hides  the  details  of  perceptual 
and  motor  processes. 

To  understand  the  difficulties  that  the  physical  sym¬ 
bol  system  hypothesis  presents  for  our  task,  we  might 
examine  another  similar  abstraction.  It  is  common  to 
regard  digital  design  as  concerned  solely  with  binary 
digits: discrete ones and zeros. Indeed, this digital abstraction
allows the use of boolean logic to synthesize
the combinational circuits out of which our computational
elements are built. By hiding the details of the analog
voltages that constitute our systems, the digital abstraction
facilitates reasoning about and construction with
these elements. However, the fact that the digital abstraction
is useful for combinational synthesis does not
mean that it suffices for all purposes. Indeed, for certain
elements, such as a bipolar switch, it may be necessary
to look beneath the digital abstraction to understand the
interactions of electrical components, e.g., to debounce
the switch. Further, certain portions of the resulting
system, such as the debouncing circuitry, may not
be interpretable directly in terms of the digital abstraction.
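
The switch example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the 2.5 V threshold, the three-sample stability window, the sample values, and the names `to_bit` and `debounce` are all hypothetical and do not come from the proposal. The point it shows is that the thresholded bits alone give a misleading picture of a bouncing contact, and that the fix has to reason about the signal over time, beneath the one-sample digital view.

```python
from typing import Iterable, List

LOGIC_THRESHOLD_V = 2.5  # assumed midpoint of a hypothetical 0-5 V logic family
STABLE_SAMPLES = 3       # assumed number of agreeing samples before a change is accepted


def to_bit(voltage: float) -> int:
    """The digital abstraction: collapse an analog voltage into 0 or 1."""
    return 1 if voltage >= LOGIC_THRESHOLD_V else 0


def debounce(voltages: Iterable[float], stable_samples: int = STABLE_SAMPLES) -> List[int]:
    """Counter-based debouncer: accept a new level only after it persists."""
    output: List[int] = []
    state = 0       # last accepted (debounced) level
    candidate = 0   # level currently being observed
    run_length = 0  # consecutive samples seen at the candidate level
    for v in voltages:
        bit = to_bit(v)
        if bit == candidate:
            run_length += 1
        else:
            candidate, run_length = bit, 1
        if candidate != state and run_length >= stable_samples:
            state = candidate
        output.append(state)
    return output


# A switch closing with contact bounce: the voltage chatters before settling.
bouncy = [0.1, 4.8, 0.3, 4.9, 0.2, 4.7, 4.9, 5.0, 4.9, 5.0]
raw_bits = [to_bit(v) for v in bouncy]  # several spurious transitions
clean_bits = debounce(bouncy)           # a single transition once the level is stable
```

In this sketch `raw_bits` flips several times for what was physically one press, while `clean_bits` changes once. A counter-based filter is just one common way to debounce in software; hardware designs often use an RC network or a latch instead.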

Approaches  that  rely  on  the  physical  symbol-system 
hypothesis  cannot  constitute  complete  explanations  of 
intelligence, precisely because they abstract away the details
of symbols' implementation. In order for a brain-in-a-box
to connect to a body, all symbols must be derivable
from sensory stimuli; but in addition, there are portions
of the system, such as the bouncy switch, that cannot
be seen from the symbolic side of the abstraction.
Thus, while symbolic approaches to cognition may provide
us with tremendous insight as to how intelligence
might work once we have symbols, they can neither tell us
how to construct those symbols nor assist us in the identification
and manipulation of the non-symbolic portion
of our system.

At the opposite extreme are several non-symbolic approaches
to cognition. From connectionism to reactive
systems to artificial life, these systems operate on stimuli
much closer to “real” sensory input, often using difficult-to-comprehend
processes to compute appropriate actions
based on these stimuli. Because they are closer to actual
sensation, these approaches have had marked success in
certain areas (e.g., video-game playing (Agre & Chapman
1987); navigation (Pomerleau 1991); “insect” intelligence
(Connell 1990), (Angle & Brooks 1990)). However,
because they lack symbols or any comparable abstraction,
these systems are often inscrutable. A corollary
is the difficulty that practitioners have had in transferring
knowledge gained in the construction of one system
to the design of the next. Because there is little
explicit  structure,  these  systems  generally  defy  descrip¬ 
tion  by  abstraction. 

We  believe  that  the  most  fruitful  approach  will  be  one 
that builds on both of these traditions (e.g., Rosenschein
& Kaelbling (1986), Kuipers & Byun (1991), Drescher
(1991), Stein (to appear), Yanco & Stein (1993)). Just as
the digital abstraction is useful for the designer of combinational
circuits, so the symbolic abstraction will be invaluable
for the designer of cognitive components. However,
combinational circuits are built out of raw voltages,
not out of ones and zeros; the binary digits are in the
mind  of  the  designer.  Similarly,  the  symbolic  abstrac¬ 
tion  will  be  a  crucial  tool  in  the  analysis  and  synthesis 
of  our  humanoids;  but  we  do  not  necessarily  expect  these 
symbols  to  appear  explicitly  in  the  humanoid's  head. 

Thus,  both  of  these  pieces  will  inform  our  ap¬ 
proach  to  representation.  However,  it  is  not  at  all 
clear  that  a  single  “symbol”  (in  the  conventional  sense, 
e.g., 'dolphin') will have a unitary representation (e.g.,
in the human brain the image of a dolphin may be
stored separately from categorical knowledge about dolphins
as sea creatures). As a result, we will need to
broaden the conventional definitions. We expect to use
lower level modules derived, e.g., from more reactive
approaches to come up with appropriate responses to
stimuli. From these, we will identify patterns of behavior
that represent generalizations, or proto-symbols, and
use these to establish reasoning that appears to be more
“symbolic”.

There  is  an  argument  that  certain  components  of 
stimulus-response  systems  are  “symbolic.”  For  example, 
if a particular neuron fires or a particular wire carries
a positive voltage whenever something red is visible,
that neuron or wire may be said to "represent" the
presence of something red. While this argument may be
perfectly reasonable as an observer's explanation of the
system, it should not be mistaken for an explanation of
what the agent in question believes. In particular, the
positive voltage on the wire does not represent the presence
of red to the agent; the positive voltage is the presence
of something red as far as the robot is concerned.

The  digital  abstraction  is  not  a  statement  about  how 
things  are;  it  is  merely  a  way  of  viewing  them.  A  com¬ 
binational  circuit  may  be  analyzed  in  terms  of  boolean 
logic, but it is voltages, not a collection of ones and zeros.
(Or, perhaps, it is electrons moving in a particular
way.) At best, the digital abstraction tells us that the
combinational circuit is amenable to analysis in terms of
ones and zeros; but it does not change the reality of what
is  there. 

Similarly,  the  utility  of  the  symbolic  abstraction  in  an¬ 
alyzing  rational  behavior  does  not  indicate  that  there  are 
actually  entities  corresponding  to  symbols  in  the  brain. 
Rather, it indicates that the brain, or more likely
portions of the brain (viz. the debounced switch), is
amenable to analysis in symbolic terms. It does not
change the fact that everything in the brain is (sub-symbolic)
neural activity; nor does the equation of brain
function  with  neural  activity  rule  out  the  utility  of  a 
symbolic  explanation. 

In  building  a  humanoid,  we  will  begin  at  this  sensory 
level.  All  intelligence  will  be  grounded  in  computation  on 
sensory  information  or  on  information  derived  from  sen¬ 
sation.  However,  some  of  this  computation  will  abstract 
away  from  explicit  sensation,  generalizing,  e.g.,  over  sim¬ 
ilar  situations  or  sensory  inputs.  Through  sensation  and 
action,  the  humanoid  will  experience  a  conceptualization 
of space: "up," "down," "near," "far," etc. We hypothesize
that at this point it will be useful for observers to
describe the behavior of the humanoid in symbolic terms.
(“It put the red blocks together.”) This is the first step
in  representation. 

The  next  step  involves  a  jump  from  the  view  of  sym¬ 
bols  as  a  convenient  but  post  hoc  explanation  (i.e.,  for 
an  observer)  to  a  view  in  which  symbols,  somehow,  ap¬ 
pear  to  the  agent  to  exist  inside  the  agent's  head.  This 
second  step  is  facilitated  by  language,  one  of  the  tools 
that  allows  us  to  become  observers  of  ourselves.  This  is 
the  trick  of  consciousness:  the  idea  that  "we"  exist,  that 
one  part  of  us  is  observing  another. 

Although  there  is  good  evidence  that  consciousness  is 
anything but a simple phenomenon (i.e., that the reality
is far more complex than our post hoc reconstruction of
it) (Springer & Deutsch 1981), it almost certainly does
have some of the properties that we attribute to it.

With  language,  symbols  become  more  than  merely  a 
post  hoc  explanation  by  others  of  the  workings  of  our 
own brains: symbols become our own explanation to ourselves.
It is this ability to distance ourselves from our
own symbols that gives rise to our illusions of consciousness
(Bickhard n.d.). How can we produce these "symbolic"
associations? The same processes that produce
responses from sensory inputs can be stimulated internally.
For example, Kosslyn (1993) has demonstrated
that portions of the visual cortex are implicated in visual
imagery,  suggesting  precisely  this  sort  of  self-stimulation. 
Stein  (to  appear)  takes  a  similar  approach  to  add  cogni¬ 
tive  capacity  to  a  behavior-based  robot. 

We can summarize our approach to representation as
follows: Stimulus-response systems abstract away from
particular inputs to treat large classes of inputs similarly.
This begins the "generalization" of particular stimuli
into complex reactions and the external appearance
of categorization, or proto-symbols. Next, these abstractions
begin to be produced without resorting to actual
sensory  inputs.  Symbol-like  behavior  results,  but  with¬ 
out  instantiating  symbols  directly. 
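
The summary above can be sketched as a toy program. This is only a hedged illustration, not the architecture the proposal intends: the two-number stimulus (redness, distance), the thresholds, the response names, and the functions `respond`, `observed_categories`, and `imagine` are all hypothetical. It shows a stimulus-response rule that treats many inputs alike, an observer grouping inputs by response (the apparent categories), and the same rule being driven by a stored input with no sensor involved.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Stimulus = Tuple[float, float]  # hypothetical sensor reading: (redness, distance)


def respond(stimulus: Stimulus) -> str:
    """Stimulus-response rule: many distinct inputs share one response."""
    redness, distance = stimulus
    if redness > 0.6:
        return "reach" if distance < 1.0 else "approach"
    return "ignore"


def observed_categories(stimuli: Sequence[Stimulus]) -> Dict[str, List[Stimulus]]:
    """An observer's grouping of inputs by response: the apparent proto-symbols.

    The grouping lives in the observer's description; respond() itself
    stores no category labels.
    """
    groups: Dict[str, List[Stimulus]] = {}
    for s in stimuli:
        groups.setdefault(respond(s), []).append(s)
    return groups


def imagine(remembered: Stimulus,
            process: Callable[[Stimulus], str] = respond) -> str:
    """Internal stimulation: run the same process on a stored input."""
    return process(remembered)


stimuli = [(0.9, 0.5), (0.7, 0.8), (0.8, 3.0), (0.1, 0.4)]
categories = observed_categories(stimuli)
imagined_response = imagine((0.95, 0.2))
```

Note that no symbol for "red thing within reach" is ever instantiated inside `respond`; the category exists only in the observer's table, which is the distinction the text draws between symbol-like behavior and explicit symbols.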

4  High  Performance  Computing 

We are proposing a very different way to use high performance
computation and communication, and proposing
to  use  it  in  a  domain  which  promises  to  become  a  major 
consumer  of  computation:  intelligent  embodied  agents 
that  interact  with  humans. 

While  traditional  parallel  processors  are  designed  to 
act  like  fast  serial  computers,  we  are  addressing  an  in¬ 
herently  parallel  task.  Indeed,  while  for  most  of  com¬ 
puter  science  the  translation  to  parallel  hardware  has 
imposed  additional  complexity  (and,  indeed,  much  cur¬ 
rent  research  is  devoted  to  minimizing  the  overhead  of 
this  translation),  we  anticipate  a  significant  simplifica¬ 
tion  of  our  task  in  virtue  of  the  parallel  hardware  avail¬ 
able. 

Much of the work on high performance computation
is benchmarked in terms of how it speeds up numerical
simulations of physical phenomena (Cypher, Ho,
Konstantinidou & Messina 1993). In these domains there is
a well defined set of computations that, given a valid set
of initial conditions, are guaranteed to be well behaved in
some sense, generating a sufficiently accurate simulation
of how events will unfold over time. Data is collected
along the way, and a final summary of how the modeled
system  evolved  over  time  is  the  result  of  the  computa¬ 
tion.  The  model  of  a  computation  is  very  much  that  of  an 
algorithm  that  is  given  input  data  and,  after  some  suit¬ 
able  computation,  outputs  some  data.  As  a  result,  much 
of  the  research  into  high  performance  parallel  computers 
is  concerned  with  how  to  present  a  shared  memory  that 
can  be  accessed  quickly  by  all  processors,  leading  to  the 
need  for  local  caching  schemes  and  high  speed  switching 
networks;  how  to  make  sure  that  all  such  views  of  mem¬ 
ory  are  consistent,  leading  to  the  need  for  handling  cache 
coherence; and how to dynamically balance the load on
all processors, given the implicit understanding that the
goal of the whole job is to complete the computation as
quickly as possible.

In our "problem" the constraints are very different.
By the nature of the system we do not need to migrate
processes, do not need a shared memory, and do not
need to dynamically redirect messages. Simple "hard
wired" message networks should suffice, with memory
only local to each processor. The goal is not to "finish"
a computation as quickly as possible but instead to pass
the data through a process in a bounded amount of time
so that the next data that the world presents to the system
can flow through without getting blocked or lost.
There is no end to a computation or final result: all is
continuously being computed and recomputed, and actions
in the world are the "outputs" of the system. But
the computation is not simply linear in ordering. There
must be many pathways between sensors and actuators,
some with very different latencies, each one contributing
to some aspect of the resulting behavior of the system.

We  need  high  performance  and  parallel  computing  in 
order  to  guarantee  the  bounds  on  computation  time  of 
any  particular  step  in  the  processes.  We  will  push  on  the 
organization  of  computation  to  do  useful  tasks  directly  in 
the  real  world,  and  will  be  pushing  in  a  direction  which 
should  lead  to  inherently  simpler-to-construct  massively 
parallel  computers.  The  applications  of  this  sort  of  pro¬ 
cessing  will  be  wide  ranging  and  indeed  may  well  become 
pervasive  throughout  our  society. 

Our  problem  is  more  one  of  maintenance  of  activity 
rather  than  achievement  of  a  single  solution  to  a  prob¬ 
lem. 

We need parallelism because of the vast amounts of
processing that need to be done in order to make sense
of a continuous and rich stream of perceptual data. We
need parallelism to coordinate the many actuation systems
that need to work in synchrony (e.g., the ocular
system and the neck must move in a coordinated fashion
at times to maintain image stability) and which need to
be servoed at high rates. We need parallelism in order
to  have  a  continuously  operating  system  that  can  be  up¬ 
graded  without  having  to  recompile,  reload,  and  restart 
all  of  the  software  that  runs  the  stable  lower  level  aspects 
of  the  humanoid.  And  finally  we  need  parallelism  for  the 
cognitive  aspects  of  the  system  as  we  are  attempting  to 
build  a  system  with  more  capability  than  can  fit  on  any 
existing  single  processor. 

But in real-time embedded systems there is another
necessary reason for parallelism. It is the fact that there
are many things to be attended to happening in the
world continuously, independently of the agent. From
this comes the notion of an agent being situated in the
world. Not only must the agent devote attention to perhaps
hundreds of different sensors many times per second,
but it must also devote attention "down stream" in
the processing chain in many different places at many
times per second as the processed sensor data flows
through the system. The actual amount of computation
needed by each of these individual processes
is in fact quite small, so small that originally we formalized
them as augmented finite state machines (Brooks
1986), although more recently we have thought of them
as real-time rules (Brooks 1990a). They are too small
to have a complete processor devoted to them in any
machine beyond a CM-2, and even there the processors
would be mostly idle. A better approach is to simulate
parallelism in a single conventional processor with its
own local memory.

Our  humanoid  robot  will  be  situated  in  a  real  world 
over  which  it  has  very  little  control.  There  will  be  people 
present,  moving  about,  changing  the  physical  environs  of 
the  humanoid,  responding  to  actions  of  the  humanoid, 
and  generating  spontaneous  behaviors  themselves.  The 
task for the humanoid will be to interact with these ultimately
unpredictable agents in a coherent way. It will
get a continuous large and rich stream of input data of
which it must make sense, relating it to past experiences
and  future  possibilities  in  the  world.  It  will  be  a  partic¬ 
ipant  in  this  world  and  must  act  with  appropriate  speed 
and  grace. 

5  Hardware  and  Software  Experimental 
Platforms 

We  have  extensive  experience  in  building  mobile  robots. 
The PIs have been directly involved in the design
and construction of over 30 different designs for mobile
robots, and with multiple instances of many of these
types of robots: over 100 robots in total.

In that previous work with mobile robots, we started
out thinking we would build one mobile robot that would
be a platform for research for a generation of graduate
students (Brooks 1986). That soon changed as we
realized three things: (1) trying to design everything
into one robot caused too many compromises in our research
goals, as early experiments soon pointed to multiple
different sensor/actuator suites which needed to
be explored, (2) graduate students working on somewhat
separate thesis projects needed their own robots if
they were to do extensive multi-hundred hours of operation
experiments, rather than simple validation demonstrations
in controlled environments as were often conducted
in many research projects (Brooks 1991b), and (3)
by continually re-engineering our designs we gradually
built more robust robots with longer mean times between
catastrophic failures.* Building many robots over a short
period of time led to rapid increases in performance over
a diverse set of robot morphologies (Brooks (1986),
Connell (1987), Horswill & Brooks (1988), Brooks (1989),
Connell (1990), Angle & Brooks (1990), Mataric (1992b),
Mataric (1992a), Ferrell (1993), Horswill (1993); see
Brooks (1990b) for an overview). At the same time,
a common software system (Brooks 1990a) was developed
which ran on many different processors, but provided
a common environment for programming all the
diverse robots. Brooks (1991b) gives a mid-course review
of some of those robots.

* This observation parallels the developments in digital
computers, where mean time between failures in the 1950's
was in the 20 minute range, extending to periods of a week
in the 1970's, and now typically we are not surprised when
our workstations run for months without needing to be
rebooted; this increase in robustness was bought with many
hundreds of iterations of the engineering cycle.

In this project too, we expect that there will be great
benefits from building the humanoid repeatedly over the
life of the project and from running the software on multiple
computer architectures, taking advantage in both
cases of technological developments that will occur independently
of this project. At the same time we will be
following a learning curve, increasing our engineering
sophistication and the inherent robustness of the systems
we build.

To this end we have already started building the zero-th
version of the humanoid over the summer of 1993,
relying on current supplies in stock and largely off the
shelf components which are being purchased with modest
amounts of unrestricted funds from previous donations.
At the same time a more extensive software development
effort is under way. We expect the zero-th
generation  hardware  to  disappear  within  a  few  months, 
but  the  software  will  form  the  kernel  of  future  systems. 

5.1  Brains 

Our  goal  is  to  take  advantage  of  the  new  availability  of 
massively  parallel  computation  in  dedicated  machines. 
We need parallelism because of the vast amounts of processing
that must be done in order to make sense of a
continuous and rich stream of perceptual data. We need
parallelism to coordinate the many actuation systems
that need to work in synchrony (e.g., the ocular system
and the neck must move in a coordinated fashion at times
to maintain image stability) and which need to be servoed
at high rates. We need parallelism in order to have
a continuously operating system that can be upgraded
without having to recompile, reload, and restart all of
the software that runs the stable lower level aspects of
the humanoid. And finally we need parallelism for the
cognitive aspects of the system as we are attempting to
build a "brain" with more capability than can fit on any
existing single processor.

But in real-time embedded systems there is yet another
necessary reason for parallelism. It is the fact
that there are many things to be attended to, happening
in the world continuously, independent of the agent.
From this comes the notion of an agent being situated
in the world. Not only must the agent devote attention
to perhaps hundreds of different sensors many times
per second, but it must also devote attention "down
stream" in the processing chain in many different places
at many times per second as the processed sensor data
flows through the system. The actual amount of computation
needed by each of these individual
processes is in fact quite small, so small that originally
we formalized them as augmented finite state machines
(Brooks 1986), although more recently we have thought
of them as real-time rules (Brooks 1990a). They are too
small to have a complete processor devoted to them in
any machine beyond a CM-2, and even there the processors
would be mostly idle. A better approach is to
simulate parallelism in a single conventional processor
with its own local memory.

For instance, Ferrell (1993) built a software system
to control a 19 actuator six legged robot using about 60
of its sensors. She implemented it as more than 1500
parallel processes running on a single Phillips 68070. (It
communicated with 7 peripheral processors which handled
sensor data collection and 100Hz motor servoing.)
Most of these parallel processes ran at rates varying between
10 and 25 Hertz. Each time each process ran, it
took at most a few dozen instructions before blocking,
waiting either for the passage of time or for some other
process to send it a message. Clearly, low cost context
switching was important.
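
The style of execution described above, many tiny processes
that each run briefly and then block until their next
activation, can be illustrated with a short sketch. This is
not the scheduler used on that robot; the class `Process`,
the function `run`, and the process names are hypothetical,
and time is simulated in integer milliseconds rather than
read from a hardware timer.

```python
import heapq

class Process:
    """A tiny periodic process: runs a short step, then blocks until its next wake-up."""
    def __init__(self, name, rate_hz, step):
        self.name = name
        self.period_ms = 1000 // rate_hz   # integer milliseconds between activations
        self.step = step                   # callable taking the current time in ms
        self.runs = 0

def run(processes, duration_ms):
    """Simulate duration_ms of time on one processor; return the total activations."""
    # Priority queue of (wake_time_ms, tie_breaker, process).
    queue = [(0, i, p) for i, p in enumerate(processes)]
    heapq.heapify(queue)
    total = 0
    while queue and queue[0][0] < duration_ms:
        now, i, proc = heapq.heappop(queue)
        proc.step(now)                     # short, bounded piece of work
        proc.runs += 1
        total += 1
        # "Block" until the next period by re-queueing at a later wake-up time.
        heapq.heappush(queue, (now + proc.period_ms, i, proc))
    return total

log = []
procs = [
    Process("leg-lift", 10, lambda t: log.append(("leg-lift", t))),
    Process("balance", 25, lambda t: log.append(("balance", t))),
]
total = run(procs, 1000)   # one simulated second
```

Because a step never keeps state on a stack between
activations, switching from one process to the next is just
a queue operation, which is the property the text identifies
as important.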

The underlying computational model used on that
robot (and with many tens of other autonomous mobile
robots we have built) consisted of networks of message-passing
augmented finite state machines. Each of these
AFSMs  was  a  separate  process.  The  messages  were  sent 
over  predefined  'wires'  from  a  specific  transmitting  to 
a  specific  receiving  AFSM.  The  messages  were  simple 
numbers  (typically  8  bits)  whose  meaning  depended  on 
the  designs  of  both  the  transmitter  and  the  receiver.  An 
AFSM  had  additional  registers  which  held  the  most  re¬ 
cent  incoming  message  on  any  particular  wire.  This  gives 
a  very  simple  model  of  parallelism,  even  simpler  than 
that of CSP (Hoare 1985). The registers could have their
values  fed  into  a  local  combinatorial  circuit  to  produce 
new  values  for  registers  or  to  provide  an  output  mes¬ 
sage.  The  network  of  AFSMs  was  totally  asynchronous, 
but  individual  AFSMs  could  have  fixed  duration  monos¬ 
tables  which  provided  for  dealing  with  the  flow  of  time 
in  the  outside  world.  The  behavioral  competence  of  the 
system  was  improved  by  adding  more  behavior-specific 
network  to  the  existing  network.  This  process  was  called 
layering.  This  was  a  simplistic  and  crude  analogy  to 
evolutionary  development.  As  with  evolution,  at  every 
stage of the development the systems were tested. Each
of  the  layers  was  a  behavior-producing  piece  of  network 
in  its  own  right,  although  it  might  implicitly  rely  on  the 
presence  of  earlier  pieces  of  network.  For  instance,  an 
explore  layer  did  not  need  to  explicitly  avoid  obstacles, 
as  the  designer  knew  that  a  previous  avoid  layer  would 
take  care  of  it.  A  fixed  priority  arbitration  scheme  was 
used  to  handle  conflicts. 
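
A minimal sketch may make the AFSM model concrete. It
assumes a much simplified reading of the description above:
registers latch the latest message on each wire, a function
stands in for the combinatorial circuit, and an arbiter
gives the avoid layer fixed priority over the explore layer.
The class `AFSM`, the command codes, the 30 unit sonar
threshold, and the fixed stepping order in `tick` are all
illustrative assumptions; the real networks were
asynchronous and also used monostable timers, which are
omitted here.

```python
NO_OPINION, FORWARD, TURN = 0, 1, 2   # hypothetical 8-bit command codes

class AFSM:
    """Augmented finite state machine: registers latch the latest message per wire."""
    def __init__(self, name, inputs, combine):
        self.name = name
        self.registers = {wire: None for wire in inputs}
        self.combine = combine     # plays the role of the combinatorial circuit
        self.outputs = []          # (receiver, wire) pairs, fixed at design time

    def connect(self, receiver, wire):
        self.outputs.append((receiver, wire))

    def receive(self, wire, value):
        self.registers[wire] = value & 0xFF   # messages are simple 8-bit numbers

    def step(self):
        value = self.combine(self.registers)
        if value is not None:
            for receiver, wire in self.outputs:
                receiver.receive(wire, value)
        return value

# Earlier layer: turn away when the sonar reading is below the threshold.
avoid = AFSM("avoid", ["sonar"],
             lambda r: TURN if r["sonar"] is not None and r["sonar"] < 30 else NO_OPINION)
# Later layer: always wants to go forward and knows nothing about obstacles.
explore = AFSM("explore", [], lambda r: FORWARD)
# Fixed priority arbitration: an avoid command wins over explore.
arbiter = AFSM("arbiter", ["avoid", "explore"],
               lambda r: r["avoid"] if r["avoid"] else r["explore"])
avoid.connect(arbiter, "avoid")
explore.connect(arbiter, "explore")

def tick(sonar_reading):
    """Feed one sonar reading through the network and return the motor command."""
    avoid.receive("sonar", sonar_reading)
    avoid.step()
    explore.step()
    return arbiter.step()
```

With this wiring, `tick(200)` yields `FORWARD` and `tick(10)`
yields `TURN`: the explore layer was added without any
obstacle logic of its own, relying on the earlier layer.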

On  top  of  the  AFSM  substrate  we  used  another 
abstraction  known  as  the  Behavior  Language,  or  BL 
(Brooks 1990a), which was much easier for the user
to program with. The output of the BL compiler was
a standard set of augmented finite state machines; by
maintaining this compatibility all existing software could
be retained. When programming in BL the user has complete
access to full Common Lisp as a meta-language by
way of a macro mechanism. Thus the user could easily
develop abstractions on top of BL, while still writing
programs which compiled down to networks of AFSMs.
In a sense, AFSMs played the role of assembly
language in normal high level computer languages. But
the structure of the AFSM networks enforced a programming
style which naturally compiled into very efficient
small processes. The structure of the Behavior Language
enforced a modularity where data sharing was restricted
to smallish sets of AFSMs, and whose only interfaces
were essentially asynchronous 1-deep buffers.


In the humanoid project we believe much of the computation,
especially for the lower levels of the system,
will naturally be of a similar nature. We expect to
perform different experiments where in some cases the
higher level computations are of the same nature and in
other cases the higher levels will be much more symbolic
in nature, although the symbolic bindings will be restricted
to within individual processors. We need to use
software and hardware environments which give support
to these requirements without sacrificing the high levels
of performance of which we wish to make use.

5.1.1  Software 

For  the  software  environment  we  have  a  number  of 
requirements: 

•  There should be a good software development environment.

•  The  system  should  be  completely  portable  over 
many  hardware  environments,  so  that  we  can  up¬ 
grade  to  new  parallel  machines  over  the  lifetime  of 
this project.

•  The  system  should  provide  efficient  code  for  per¬ 
ceptual  processing  such  as  vision. 

•  The  system  should  let  us  write  high  level  symbolic 
programs  when  desired. 

•  The  system  language  should  be  a  standardized  lan¬ 
guage  that  is  widely  known  and  understood. 

In  summary  our  software  environment  should  let  us  gain 
easy  access  to  high  performance  parallel  computation. 

We have chosen to use Common Lisp (Steele Jr. 1990)
as the substrate for all software development. This
gives us good programming environments including type
checked debugging, rapid prototyping, symbolic computation,
easy ways of writing embedded language abstractions,
and automatic storage management. We believe
that Common Lisp is superior to C (the other major
contender) in all of these aspects.

The  problem  then  is  how  to  use  Lisp  in  a  massively 
parallel  machine  where  each  node  may  not  have  the  vast 
amounts  of  memory  that  we  have  become  accustomed 
to  feeding  Common  Lisp  implementations  on  standard 
Unix boxes.

We have a long history of building high performance
Lisp compilers (Brooks, Gabriel & Steele Jr. 1982), including
one of the two most common commercial Lisp
compilers on the market: Lucid Lisp (Brooks, Posner,
McDonald, White, Benson & Gabriel 1986).

Recently we have developed L (Brooks 1993), a
retargetable small efficient Lisp which is a downwardly
compatible subset of Common Lisp. When compiled
for a 68000 based machine the load image (without the
compiler) is only 140K bytes, but includes multiple values,
strings, characters, arrays, a simplified but compatible
package system, all the "ordinary" aspects of
format, backquote and comma, setf etc., full Common
Lisp lambda lists including optionals and keyword arguments,
macros, an inspector, a debugger, defstruct
(integrated with the inspector), block, catch, and throw,
etc., full dynamic closures, a full lexical interpreter, floating
point, fast garbage collection, and so on. The compiler
runs in time linear in the size of an input expression
except in the presence of lexical closures. It nevertheless
produces highly optimized code in most cases. L is
missing flet and labels, generic arithmetic, bignums,
rationals, complex numbers, the library of sequence functions
(which can be written within L) and esoteric parts
of format and packages.

The L system is an intellectual descendant of the dynamically
retargetable Lucid Lisp compiler (Brooks et al.
1986) and the dynamically retargetable Behavior Language
compiler (Brooks 1990a). The system is totally
written in L with machine dependent backends for retargeting.
The first backend is for the Motorola 68020
(and upwards) family, but it is easily retargeted to new
architectures. The process consists of writing a simple
machine description, providing code templates for about
100 primitive procedures (e.g., fixed precision integer +,
*, =, etc., string indexing CHAR and other accessors, CAR,
CDR, etc.), code macro expansion for about 20 pseudo
instructions (e.g., procedure call, procedure exit, checking
correct number of arguments, linking CATCH frames,
etc.) and two corresponding sets of assembler routines
which are too big to be expanded as code templates every
time, but are so critical in speed that they need to be
written in machine language, without the overhead of a
procedure call, rather than in Lisp (e.g., CONS, spreading
of multiple values on the stack, etc.). There is a version
of the I/O system which operates by calling C routines
(e.g., fgetchar, etc.; this is how the Macintosh version
of L runs) so it is rather simple to port the system to any
hardware platform we might choose to use in the future.

Note carefully the intention here: L is to be the delivery
vehicle running on the brain hardware of the humanoid,
potentially on hundreds or thousands of small
processors. Since it is fully downward compatible with
Common Lisp, however, we can carry out code development
and debugging on standard work stations with full
programming environments (e.g., in Macintosh Common
Lisp, or Lucid Common Lisp with Emacs 19 on a Unix
box, or in the Harlequin programming environment on a
Unix  box).  We  can  then  dynamically  link  code  into  the 
running  system  on  our  parallel  processors. 

There  are  two  remaining  problems:  (1)  how  to  main¬ 
tain  super  critical  real-time  performance  when  using  a 
Lisp system without hard ephemeral garbage collection,
and  (2)  how  to  get  the  level  of  within-processor  paral¬ 
lelism  described  earlier. 

The  structure  of  L's  implementation  is  such  that  mul¬ 
tiple  independent  heaps  can  be  maintained  within  a  sin¬ 
gle  address  space,  sharing  all  the  code  and  data  segments 
of  the  Lisp  proper.  In  this  way  super-critical  portions  of 
a  system  can  be  placed  in  a  heap  where  no  consing  is  oc¬ 
curring,  and  hence  there  is  no  possibility  that  they  will 
be  blocked  by  garbage  collection. 

The  Behavior  Language  (Brooks  1990a)  is  an  exam¬ 
ple  of  a  compiler  which  builds  special  purpose  static 
schedulers  for  low  overhead  parallelism.  Each  process 
ran  until  blocked  and  the  syntax  of  the  language  forced 
there to always be a blocking condition, so there was no
need for pre-emptive scheduling. Additionally the syntax
and semantics of the language guaranteed that there
would be zero stack context needed to be saved when a
blocking condition was reached. We will need to build
a new scheduling system with L to address similar issues
in this project. To fit in with the philosophy of the
rest of the system it must be a dynamic scheduler so
that new processes can be added and deleted as a user
types to the Lisp listener of a particular processor. Reasonably
straightforward data structures can keep these
costs to manageable levels. It is rather straightforward
to build a phase into the L compiler which can recognize
the situations described above. Thus it is straightforward
to implement a set of macros which will provide a
language abstraction on top of Lisp which will provide
all the functionality of the Behavior Language and which
will additionally let us have dynamic scheduling. Almost
certainly a pre-emptive scheduler will be needed in addition,
as it would be difficult to enforce a computation
time limit syntactically when Common Lisp will essentially
be available to the programmer. At the very least
the case of the pre-emptive scheduler having to strike
down a process will be useful as a safety device, and
will also act as a debugging tool for the user to identify
time critical computations which are stressing the
bounded computation style of writing. In other cases
static analysis will be able to determine maximum stack
requirements for a particular process, and so heap allocated
stacks will be usable.*
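
The scheduling requirements above can be sketched in a few
lines. This is only an illustration under stated
assumptions, not the planned L scheduler: processes are
modeled as Python generators that run until they reach a
blocking condition (`yield`), processes can be added and
deleted while the system runs, and a process whose step
exceeds a time budget is struck down. The names `Scheduler`,
`FakeClock`, `servo`, and `hog` are hypothetical, and the
overrun is only detected after the step returns, which is
weaker than true pre-emption.

```python
import time

class FakeClock:
    """Deterministic stand-in for a hardware timer, used only for the demo."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

class Scheduler:
    """Dynamic run-until-blocked scheduler with a per-step time budget."""
    def __init__(self, budget_s, clock=time.perf_counter):
        self.budget_s = budget_s
        self.clock = clock
        self.processes = {}      # name -> generator
        self.struck_down = []    # names removed for exceeding the budget

    def add(self, name, generator):
        self.processes[name] = generator

    def delete(self, name):
        self.processes.pop(name, None)

    def run_round(self):
        for name in list(self.processes):        # snapshot: the set may change
            gen = self.processes.get(name)
            if gen is None:
                continue
            start = self.clock()
            try:
                next(gen)                         # run until the next blocking point
            except StopIteration:
                self.delete(name)
                continue
            if self.clock() - start > self.budget_s:
                self.struck_down.append(name)     # safety device and debugging aid
                self.delete(name)

def servo(clock, ticks):
    while True:
        clock.advance(0.001)     # pretend the step costs 1 ms
        ticks.append(clock())
        yield                    # blocking condition

def hog(clock):
    while True:
        clock.advance(0.5)       # pretend the step costs 500 ms
        yield

clock = FakeClock()
ticks = []
sched = Scheduler(budget_s=0.010, clock=clock)
sched.add("servo", servo(clock, ticks))
sched.add("hog", hog(clock))
sched.run_round()
sched.run_round()
```

After two rounds the well behaved process has run twice and
the over-budget one has been removed and recorded, which is
the debugging role the text describes for striking down a
process.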

The software system so far described will be used to
implement crude forms of "brain models", where computations
will be organized in ways inspired by the sorts of
anatomical divisions we see occurring in animal brains.
Note that we are not saying we will build a model of a
particular brain, but rather there will be a modularity inspired
by such components as visual cortex, auditory cortex,
etc., and within and across those components there
will be further modularity, e.g., a particular subsystem
to implement the vestibulo-ocular response (VOR).

Thus besides on-processor parallelism we will need to
provide a modularity tool that packages processes into
groups and limits data sharing between them. Each
package will reside on a single processor, but often
processors will host many such packages. A package
that communicates with another package should be insulated
at the syntax level from knowing whether the
other package is on the same or a different processor.
The communication medium between such packages
will again be 1-deep buffers without queuing or receipt
acknowledgment; any such acknowledgment will need to
be implemented as a backward channel, much as we see
throughout the cortex (Churchland & Sejnowski 1992).
This packaging system can be implemented in Common
Lisp as a macro package.
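
The communication discipline just described can be
illustrated with a small sketch. It is written in Python
rather than as a Lisp macro package, and every name in it
(`LocalPort`, `RemotePort`, `connect`, `transfer`) is a
hypothetical stand-in. The point is only that both kinds of
port expose the same `write` and `read` calls, so the
communicating code cannot tell whether its peer is on the
same processor, and that a new write simply overwrites an
unread value.

```python
class LocalPort:
    """1-deep buffer: a write overwrites any unread value; no queue, no acknowledgment."""
    def __init__(self):
        self._value = None
        self._fresh = False

    def write(self, value):
        self._value = value
        self._fresh = True

    def read(self):
        """Return (latest value, whether it arrived since the previous read)."""
        fresh, self._fresh = self._fresh, False
        return self._value, fresh

class RemotePort(LocalPort):
    """Same interface, but a write becomes visible only after the link transfers it."""
    def __init__(self):
        super().__init__()
        self._in_flight = None
        self._pending = False

    def write(self, value):
        # Still 1-deep: a value not yet transferred is overwritten, not queued.
        self._in_flight, self._pending = value, True

    def transfer(self):
        """Called by the (simulated) inter-processor link at its packet rate."""
        if self._pending:
            super().write(self._in_flight)
            self._pending = False

def connect(sender_processor, receiver_processor):
    """Choose the port type from placement; callers use write/read either way."""
    if sender_processor == receiver_processor:
        return LocalPort()
    return RemotePort()

same = connect(0, 0)
same.write(3)
same.write(7)                    # overwrites the unread 3
first_read = same.read()
second_read = same.read()

far = connect(0, 1)
far.write(5)
before_transfer = far.read()     # nothing has crossed the link yet
far.transfer()
after_transfer = far.read()
```

An acknowledgment, where one is wanted, would be a second
port wired in the opposite direction rather than a feature
of the buffer itself.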

We  expect  all  such  system  level  software  development 
to be completed in the first twelve months of the project.


* The problem with heap allocated stacks in the general
case is that there will be no overflow protection into the rest
of the heap.


5.1.2  Computational  Hardware 

The computational model presented in the previous
section is somewhat different from that usually assumed
in high performance parallel computer applications.
Typically (Cypher et al. 1993) there is a strong
bias on system requirements from the sort of benchmarks
that are used to evaluate performance. The standard
benchmarks for modern high performance computation
seem to be Fortran codes for hydrodynamics, molecular
simulations, or graphics rendering. We are proposing
a very different application with very different requirements:
in particular we require real-time response
to a wide variety of external and internal events, we require
good symbolic computation performance, we require
only integer rather than high performance floating
point operations,* we require delivery of messages
only to specific sites determined at program design time,
rather than at run-time, and we require the ability to do
very fast context switches because of the large number
of parallel processes that we intend to run on each individual
processor.

The fact that we will not need to support pointer references
across the computational substrate will mean
that we can rely on much simpler, and therefore
higher performance, parallel computers than many other
researchers; we will not have to worry about a consistent
global memory, cache coherence, or arbitrary message
routing. Since these are different requirements than
those that are normally considered, we have to make
some measurements with actual programs before
we can make an intelligent off the shelf choice of computer
hardware.

In order to answer some of these questions we are currently
building a zero-th generation parallel computer. It
is being built on a very low budget with off the shelf components
wherever possible (a few fairly simple printed
circuit boards need to be fabricated). The processors
are 16MHz Motorola 68332s on a standard board built
by Vesta Technology. These plug 16 to a backplane.
The backplane provides each processor with six communications
ports (using the integrated timing processor
unit to generate the required signals along with special
chip select and standard address and data lines)
and a peripheral processor port. The communications
ports will be hand-wired with patch cables, building a
fixed topology network. (The cables incorporate a single
dual ported RAM (8K by 16 bits) that itself includes
hardware semaphores writable and readable by the two
processors being connected.) Background processes running
on the 68332 operating system provide sustained
rate transfers of 60Hz packets of 1K bytes on each port,
with higher peak rates if desired. These sustained rates
do consume processing cycles from the 68332. On non-vision
processors we expect much lower rates will be
needed, and even on vision processors we can probably
reduce the packet frequency to around 15Hz. Each
processor has an operating system, L, and the dynamic
scheduler residing in 1M of EPROM. There is 1M of
RAM for program, stack and heap space. Up to [...]
processors can be connected together.

* Consider the dynamic range possible in single signal
channels in the human brain and it soon becomes apparent
that all that we wish to do is certainly achievable with neither
a span of 600 orders of magnitude nor 47 significant binary
digits.
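
As a rough check on what the sustained transfer rates quoted
for the zero-th generation backplane imply, the arithmetic
can be written out. The sketch below assumes that "1K bytes"
means 1024 bytes, that all six communications ports of one
processor run at the sustained 60Hz rate at once, and that
the processor clock is 16MHz; the function name is
hypothetical and the result is only a budget of clock cycles
per transferred byte, not a measured cost.

```python
def sustained_bytes_per_second(packet_bytes, rate_hz, ports=1):
    """Sustained throughput for fixed-size packets sent at a fixed rate."""
    return packet_bytes * rate_hz * ports

PACKET_BYTES = 1024        # "1K bytes", read as 1024 (assumption)
PACKET_RATE_HZ = 60        # sustained packet rate per port
PORTS = 6                  # communications ports per processor
CLOCK_HZ = 16_000_000      # 16MHz processor clock

per_port = sustained_bytes_per_second(PACKET_BYTES, PACKET_RATE_HZ)
per_processor = sustained_bytes_per_second(PACKET_BYTES, PACKET_RATE_HZ, PORTS)

# Clock cycles that elapse per byte moved if every port runs flat out.
cycles_per_byte = CLOCK_HZ / per_processor

print(per_port, per_processor, round(cycles_per_byte, 1))
```

Under these assumptions one port carries 61,440 bytes per
second and six ports together carry 368,640 bytes per
second, leaving roughly 43 clock cycles per byte for
everything the processor does, which is consistent with the
remark that sustained transfers consume processing cycles.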

Up to 16 backplanes can be connected to a single front
end processor (FEP) via a shared [...] baud serial line
to a SCSI emulator. A large network of 68332s can span
many FEPs if we choose to extend the construction of
this zero-th prototype. Initially we will use a Macintosh
as an FEP. Software written in Macintosh Common Lisp
on the FEP will provide disk I/O services to the 68332s,
monitor status and health packets from them, and provide
the user with a Lisp listener to any processor they
might choose.

The zero-th version uses the standard Motorola SPI
(serial peripheral interface) to communicate with up to
16 Motorola 6811 processors per 68332. These are a single
chip processor with onboard EEPROM (2K bytes)
and RAM (256 bytes), including a timer system, an SPI
interface, and 8 channels of analog to digital conversion.
We are building a small custom board for this processor
that includes opto-isolated motor drivers and some
standard analog support for sensors.

We expect our first backplane to be operational by
August 1st, 1993 so that we can commence experiments
with our first prototype body. We will collect statistics
on inter-processor communication throughput, effects of
latency, and other measures so that we can better choose
a larger scale parallel processor for more serious versions
of the humanoid.

In the meantime, however, there are certain developments
on the horizon within the MIT Artificial Intelligence
Lab which we expect to capitalize upon in order
to dramatically upgrade our computational systems
for early vision, and hence the resolution at which we
can afford to process images in real time. The first of
these, expected in the fall, will be a somewhat similar
distributed processing system based on the much higher
performance Texas Instruments C40, which comes with
built in support for fixed topology message passing. We
expect these systems to be available in the Fall 93 timeframe.
In October [...] we expect to be able to make use
of the Abacus system, a bit level reconfigurable vision
front-end processor being built under ARPA sponsorship
which promises Tera-op performance on 10 bit fixed precision
operands. Both these systems will be simply integrable
with our zero-th order parallel processor via the
standard dual-ported RAM protocol that we are using.

5.2  Bodies 

As with the computational hardware, we are also currently
engaged in building a zero-th generation body
for early experimentation and design refinement towards
more serious constructions within the scope of this proposal.
We are presently limited by budgetary constraints
to building an immobile, armless, deaf torso with only
black and white vision.

We currently have 28 operational robots in our lab, each
with between [...] and [...] of these 6811 processors, and several
dozen other robots with at least one such processor on board.
We have great experience in writing compiler backends for
these processors (including [...]) and great experience in using
them for all sorts of servoing, sensor monitoring, and
communications tasks.

In the following subsections we outline the constraints
and requirements on a full scale humanoid body and also
include, where relevant, details of our zero-th level prototype.

5.2.1  Eyes 

There has been quite a lot of recent work on animate
vision using saccading stereo cameras, most notably at
Rochester (Ballard 1989), (Coombs 1992), but also more
recently at many other institutions, such as Oxford
University.

The humanoid needs a head with high mechanical
performance eyeballs and foveated vision if it is to be
able to participate in the world with people in a natural
way. Even our earliest heads will include two eyes,
with foveated vision able to pan and tilt as a unit, and
with independent saccading ability (three saccades per
second) and vergence control of the eyes. Fundamental
vision based behaviors will include a visually calibrated
vestibular-ocular reflex, smooth pursuit, visually
calibrated saccades, and object centered foveal relative
depth stereo. Independent visual systems will provide
peripheral and foveal motion cues, color discrimination,
human face pop-outs, and eventually face recognition.
Over the course of the project, object recognition
based on "representations" from body schemas and
manipulation interactions will be developed. This is
completely different from any conventional object recognition
schemes, and can not be attempted without an integrated
vision and manipulation environment as we propose.

The eyeballs need to be able to saccade up to about
three times per second, stabilizing for 250ms at each
stop. Additionally the yaw axes should be controllable
for vergence to a common point and drivable in a manner
appropriate for smooth pursuit and for image stabilization
as part of a vestibulo-ocular response (VOR) to
head movement. The eyeballs do not need to be force
or torque controlled but they do need good fast position
and velocity control. We have previously built a single
eyeball, A-eye, on which we implemented a model of
VOR, ocular-kinetic response (OKR) and saccades, all
of which used dynamic visually based calibration (Viola
1990).
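
The saccade figures above imply a tight motion budget: at three
saccades per second with 250ms of stabilization at each stop, only the
remainder of each cycle is left for moving the eye. The sketch below
works through that arithmetic. The 30 degree amplitude in the usage
lines is a hypothetical value chosen for illustration, not a
requirement stated in this proposal.

```python
# Time budget implied by the saccade requirement in the text.
# The amplitude used in the example below is a hypothetical value.


def saccade_travel_time_s(saccades_per_second: float, dwell_s: float) -> float:
    """Time left for eye motion in each saccade cycle, in seconds."""
    period_s = 1.0 / saccades_per_second
    travel_s = period_s - dwell_s
    if travel_s <= 0:
        raise ValueError("dwell time leaves no time for motion")
    return travel_s


def mean_velocity_deg_per_s(amplitude_deg: float, travel_s: float) -> float:
    """Average angular velocity needed to cover an amplitude in the travel time."""
    return amplitude_deg / travel_s


if __name__ == "__main__":
    travel = saccade_travel_time_s(3.0, 0.250)
    print(f"travel time per saccade: {travel * 1000:.1f} ms")
    # Hypothetical 30 degree saccade:
    print(f"mean velocity: {mean_velocity_deg_per_s(30.0, travel):.0f} deg/s")
```

With the stated figures roughly 83ms remains for each movement, which
is why good fast position and velocity control matters more here than
force or torque control.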

Other active vision systems have had both eyeballs
mounted on a single tilt axis. We will begin experiments
with separate tilt axes but if we find that relative tilt
motion is not very useful we will back off from this
requirement in later versions of the head.

The  cameras  need  to  cover  a  wide  field  of  view,  prefer¬ 
ably  close  to  180  degrees,  while  also  giving  a  foveated 
central region. Ideally the images should be RGB (rather
than the very poor color signal of standard NTSC). A
resolution of 512 by 512 at both the coarse and fine scale
is  desirable. 

Our zero-th version of the cameras is black and white
only. Each eyeball consists of two small lightweight cameras
mounted with parallel axes. One gives a 115 degree
field of view and the other gives a [...] degree foveated
region. In order to handle the images in real time in our
zero-th parallel processor we will subsample the images
to be much smaller than the ideal.
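
One simple way to subsample is block averaging, sketched below. This
is an illustrative assumption on our part; the proposal does not say
which subsampling method or output size will be used. The image is
represented as a list of rows of integer gray levels.

```python
# Block-average subsampling of a gray-level image (illustrative sketch).
# The method and the factor are assumptions; the text only says that the
# images will be subsampled.


def subsample(image, factor):
    """Reduce a 2-D list of gray levels by an integer factor in each axis."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Sum every pixel in the factor-by-factor block, then average.
            block_sum = sum(
                image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            out_row.append(block_sum // (factor * factor))
        out.append(out_row)
    return out


if __name__ == "__main__":
    full = [[0] * 512 for _ in range(512)]
    small = subsample(full, 16)
    print(len(small), "by", len(small[0]))
```

For example, a factor of 16 reduces a 512 by 512 image to 32 by 32, or
1024 pixels, so that an 8 bit image of that size would fit in a single
1K byte packet.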

Later versions of the head will have full RGB color
cameras, wider angles for the peripheral vision, much
finer grain sampling of the images, and perhaps a colinear
optics set up using optical fiber cables and beam
splitters. With more sophisticated high speed processing
available we will also be able to do experiments with
log-polar image representations.
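
In a log-polar representation, samples are taken on rings whose radii
grow geometrically from the image center, so resolution is finest at
the center and coarsest in the periphery. The sketch below shows one
minimal way to build such a sampling with nearest-neighbour lookup.
The ring and wedge counts, the minimum radius, and the function name
are all illustrative assumptions.

```python
import math

# Nearest-neighbour log-polar resampling of a gray-level image (sketch).
# Ring/wedge counts and r_min are illustrative parameters, not values
# taken from the proposal.


def log_polar_sample(image, n_rings, n_wedges, r_min=1.0):
    """Return n_rings rows of n_wedges samples taken around the image center."""
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    r_max = min(cx, cy)  # largest radius that stays inside the image
    out = []
    for i in range(n_rings):
        # Radii grow geometrically, so ring spacing is uniform in log(r).
        frac = i / (n_rings - 1) if n_rings > 1 else 0.0
        radius = r_min * (r_max / r_min) ** frac
        ring = []
        for j in range(n_wedges):
            angle = 2.0 * math.pi * j / n_wedges
            x = int(round(cx + radius * math.cos(angle)))
            y = int(round(cy + radius * math.sin(angle)))
            ring.append(image[y][x])
        out.append(ring)
    return out


if __name__ == "__main__":
    # Test pattern whose value encodes the pixel position.
    img = [[10 * y + x for x in range(9)] for y in range(9)]
    for ring in log_polar_sample(img, 4, 8):
        print(ring)
```

Each output row corresponds to one ring and each column to one angular
wedge, so a rotation or scaling about the center of the original image
becomes a shift along one axis of the log-polar image.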

5.2.2  Ears,  Voice

Almost no work has been done on sound understanding,
as distinct from speech understanding. This project
will start on sound understanding to provide a much
more solid processing base for later work on speech input.
Early behavior layers will spatially correlate noises
with visual events, and spatial registration will be continuously
self calibrating. Efforts will concentrate on using
this physical cross-correlation as a basis for reliably
pulling out interesting events from background noise,
and mimicking the cocktail party effect of being able
to focus attention on particular sound sources. Visual
correlation with face pop-outs, etc., will then be used
to be able to extract human sound streams. Work will
proceed on using these sound streams to mimic infants'
abilities to ignore language dependent irrelevances. By
the time we get to elementary speech we will therefore
have a system able to work in noisy environments and
accustomed to multiple speakers with varying accents.

Sound perception will consist of three high quality
microphones.  (Although  the  human  head  uses  only  two 
auditory  inputs,  it  relies  heavily  on  the  shape  of  the  ex¬ 
ternal  ear  in  determining  the  vertical  component  of  di¬ 
rectional  sound  source.)  Sound  generation  will  be  ac¬ 
complished  using  a  speaker. 
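
With more than one microphone, the direction of a sound source can be
estimated from the difference in arrival time between microphones.
The sketch below handles a single microphone pair: it finds the delay
by brute-force cross-correlation and converts it to a bearing under a
far-field assumption. This is only one possible approach, given here
for illustration; the proposal does not specify a localization method,
and the speed of sound, sample rate, and microphone spacing used are
assumed values.

```python
import math

# Time-difference-of-arrival sketch for one microphone pair.
# All constants and parameter values are assumptions for illustration.

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees C


def best_lag(a, b, max_lag):
    """Lag in samples at which b best matches a delayed copy of a.

    A positive result means the sound reached microphone b later than a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(a)):
            m = n + lag
            if 0 <= m < len(b):
                score += a[n] * b[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def bearing_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Far-field bearing from the delay: sin(theta) = c * delay / spacing."""
    delay_s = lag_samples / sample_rate_hz
    s = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against noise-induced overshoot
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    a = [0, 0, 1, 3, -2, 0, 0, 0, 0, 0]
    b = [0, 0, 0, 0, 1, 3, -2, 0, 0, 0]  # same click, two samples later
    lag = best_lag(a, b, 4)
    print("lag in samples:", lag)
    print("bearing:", bearing_deg(lag, 44100.0, 0.2), "degrees")
```

A single pair only resolves the angle about the axis joining the two
microphones. Repeating the estimate for the other pairs formed by a
third microphone is one way to recover the vertical component as well.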

Sound is critical for several aspects of the robot's
activity. First, sound provides immediate feedback for
motor manipulation and positioning. Babies learn to find
and use their hands by batting at and manipulating toys
that jingle and rattle. Adults use such cues as contact
noises (the sound of an object hitting the table) to
provide feedback to motor systems. Second, sound aids
in socialization even before the emergence of language.
Patterns such as turn-taking and mimicry are critical
parts of children's development, and adults use guttural
gestures to express attitudes and other conversational
cues. Certain signal tones indicate encouragement or
disapproval to all ages and stages of development.
Finally, even pre-verbal children use sound effectively to
convey intent; until our robots develop true language,
other sounds will necessarily be a major source of
communication.

5.2.3  Torsos 

In order for the humanoid to be able to participate in
the same sorts of body metaphors as are used by humans,
it needs to have a symmetric human-like torso. It needs
to be able to experience imbalance, feel symmetry, learn
to coordinate head and body motion for stable vision,
and be able to experience relief when it relaxes its body.
Additionally the torso must be able to support the head,
the arms, and any objects they grasp.

The torsos we build will initially have a three degree
of freedom hip, with the axes passing through a common
point, capable of leaning and twisting to any position in
about three seconds, somewhat slower than a human.
The neck will also have three degrees of freedom, with
the axes passing through a common point which will also
lie along the spinal axis of the body. The head will be
capable of yawing at 90 degrees per second, less than
peak human speed, but well within the range of natural
human motions. As we build later versions we expect to
increase these performance figures to more closely match
the abilities of a human.

Apart  from  the  normal  sorts  of  kinematic  sensors,  the 
torso  needs  a  number  of  additional  sensors  specifically 
aimed  at  providing  input  fodder  for  the  development  of 
bodily  metaphors.  In  particular,  strain  gauges  on  the 
spine  can  give  the  system  a  feel  for  its  posture  and  the 
symmetry  of  a  particular  configuration,  plus  a  little  in¬ 
formation  about  any  additional  load  the  torso  might  bear 
when  an  arm  picks  up  something  heavy.  Heat  sensors  on 
the  motors  and  the  motor  drivers  will  give  feedback  as 
to  how  much  work  has  been  done  by  the  body  recently, 
and  current  sensors  on  the  motors  will  give  an  indication 
of  how  hard  the  system  is  working  instantaneously. 

Our zero-th level torso is roughly [...] inches from the
base of the spine to the base of the neck. This corresponds
to a smallish adult. It uses DC motors with built
in gearboxes. The main concern we have is how quiet it
will  be,  as  we  do  not  want  the  sound  perception  system 
to  be  overwhelmed  by  body  noise. 

Later  versions  of  the  torsos  will  have  touch  sensors 
integrated  around  the  body,  will  have  more  compliant 
motion,  will  be  quieter,  and  will  need  to  provide  better 
cabling  ducts  so  that  the  cables  can  all  feed  out  through 
a  lower  body  outlet. 

5.2.4  Arms 

The eventual manipulator system will be a compliant
multi-degree of freedom arm with a rather simple hand.
(A better hand would be nice, but hand research is not
yet at a point where we can get an interesting, easy-to-use,
off-the-shelf hand.) The arm will be safe enough
that humans can interact with it, handing it things and
taking things from it. The arm will be compliant enough
that the system will be able to explore its own body
(for instance, by touching its head system) so that it will
be able to develop its own body metaphors. The full
design of even the first pair of arms is not yet completely
worked out, and current funding does not permit
the inclusion of arms on the zero-th level humanoid. In
this section, we describe our desiderata for the arms and
hands.

We want the arms to be very compliant yet still able
to lift weights of a few pounds so that they can interact
with human artifacts in interesting ways. Additionally
we want the arms to have redundant degrees of freedom
(rather than the six seen in a standard commercial robot
arm), so that in many circumstances we can burn some
of those degrees of freedom in order to align a single
joint so that the joint coordinates and task coordinates
very nearly match. This will greatly simplify control of
manipulation. It is the sort of thing people do all the
time: for example, when bracing an elbow or the base
of the palm (or even their middle and last two fingers)
on a table to stabilize the hand during some delicate (or
not so delicate) manipulation.

The hands in the first instances will be quite simple:
devices that can grasp from above relying heavily on
mechanical compliance; they may have as few as one
degree of control freedom.

More sophisticated, however, will be the sensing on
the arms and hands. We will use forms of conductive
rubber to get a sense of touch over the surface of the
arm, so that it can detect (compliant) collisions it might
participate in. As with the torso there will be liberal use
of strain gauges, heat sensors and current sensors so that
the system can have a "feel" for how its arms are being
used and how they are performing.

We also expect to move towards a more sophisticated
type of hand in later years of this project. Initially,
unfortunately, we will be forced to use motions of the upper
joints of the arm for fine manipulation tasks. More
sophisticated hands will allow us to use finger motions,
with much lower inertias, to carry out these tasks.

6  Development  Plan 

We plan on modeling the brain at a level above the neural
level, but below what would normally be thought of as
the cognitive level.

We understand abstraction well enough to know how
to engineer a system that has similar properties and
connections to the human brain without having to model its
detailed local wiring. At the same time it is clear from
the literature that there is no agreement on how things
are really organized computationally at higher or modular
levels, or indeed whether it even makes sense to talk
about modules of the brain (e.g., short term memory,
and long term memory) as generative structures.

Nevertheless, we expect to be guided, or one might
say inspired, by what is known about the high level
connectivity within the human brain (although admittedly
much of our knowledge actually comes from macaques
and other primates and is only extrapolated to be true
of humans, a problem of concern to some brain scientists
(Crick & Jones 1993)). Thus for instance we expect
to have identifiable clusters of processors which we
will be able to point to and say they are performing a
role similar to that of the cerebellum (e.g., refining gross
motor commands into coordinated smooth motions), or
the cortex (e.g., some aspects of searching
generalization/specialization hierarchies in object recognition
(Ullman 1991)).

At another level we will directly model human systems
where they are known in some detail. For instance
there is quite a lot known about the control of eye movements
in humans (again mostly extrapolated from work
with monkeys) and we will build in a vestibulo-ocular
response (VOR), OKR, smooth pursuit, and saccades using
the best evidence available on how this is organized
in humans (Lisberger 1988).
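
To make the idea of a visually calibrated VOR concrete, the sketch
below counter-rotates the eye against head velocity with an adjustable
gain and uses the residual image motion (retinal slip) to correct that
gain. It is a deliberately simple error-driven rule that we assume for
illustration; it is not the model from the cited literature nor the
calibration scheme used on our earlier eyeball, and the learning rate
and starting gain are arbitrary.

```python
# Illustrative VOR with visually driven gain calibration.
# The update rule, learning rate, and initial gain are assumptions.


def vor_eye_velocity(head_velocity, gain):
    """Command the eye to rotate opposite to the head, scaled by the gain."""
    return -gain * head_velocity


def retinal_slip(head_velocity, eye_velocity):
    """Residual image motion for a distant stationary target."""
    return head_velocity + eye_velocity


def adapt_gain(gain, head_velocity, slip, learning_rate):
    """Raise the gain when slip follows the head, lower it when it opposes."""
    return gain + learning_rate * slip * head_velocity


def calibrate(head_velocities, gain=0.5, learning_rate=1e-3):
    """Run the VOR over a sequence of head velocities and return the final gain."""
    for h in head_velocities:
        eye = vor_eye_velocity(h, gain)
        slip = retinal_slip(h, eye)
        gain = adapt_gain(gain, h, slip, learning_rate)
    return gain


if __name__ == "__main__":
    # Head oscillating at plus and minus 10 degrees per second.
    final_gain = calibrate([10.0, -10.0] * 100)
    print(f"gain after calibration: {final_gain:.6f}")
```

With perfect compensation the slip is zero and the gain stops changing,
so the rule settles at a gain of one for a distant target. The learning
rate must be small relative to the squared head velocity for the update
to remain stable.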


A third level of modeling or inspiration that we will
use is at the developmental level. For instance once
we have some sound understanding developed, we will
use models of what happens in child language development
to explore ways of connecting physical actions in
the world to a ground of language and the development
of symbols (Bates 1970), (Bates, Bretherton & Snyder
1988), including indexical (Lempert & Kinsbourne 1985)
and turn-taking behavior, interpretation of tone and facial
expressions and the early use of memorized phrases.

Since we will have a number of faculty, post-doctoral
fellows, and graduate students working on concurrent
research projects, and since we will have a number of
concurrently active humanoid robots, not all pieces that
are developed will be intended to fit together exactly.
Some will be incompatible experiments in alternate ways
of building subsystems, or putting them together. Some
will be pushing on particular issues in language, say, that
may not be very related to some particular other issues,
e.g., saccades. Also, quite clearly, at this stage we can
not have a development plan fully worked out for five
years, as many of the early results will change the way
we think about the problems and what should be the
next steps.

In  figure  1,  we  summarize  our  current  plans  for  devel¬ 
oping  software  systems  on  board  our  series  of  humanoids. 
In many cases there will be earlier work off-board the
robots, but to keep clutter down in the diagram we have
omitted  that  work  here. 